SGD Learns the Conjugate Kernel Class of the Network
Author
Abstract
We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely et al. [13]. The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, it is the first polynomial-time guarantee for the standard neural network learning algorithm for networks of depth greater than two. As corollaries, it follows that for neural networks of any depth between 2 and log(n), SGD is guaranteed to learn, in polynomial time, constant-degree polynomials with polynomially bounded coefficients. Likewise, it follows that SGD on large enough networks can learn any continuous function (though not in polynomial time), complementing classical expressivity results.
∗Google Brain
arXiv:1702.08503v2 [cs.LG], 19 May 2017
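To make the setting concrete, here is a minimal sketch of the training procedure the guarantee refers to: plain single-example SGD on a randomly initialized, fully connected ReLU network of depth greater than two. The architecture, width, step size, toy data, and hinge-style loss below are illustrative assumptions, not the exact construction analyzed in the paper.

```python
# Minimal sketch (illustrative assumptions): plain single-example SGD on a
# depth-3 fully connected ReLU network with standard random initialization.
import numpy as np

rng = np.random.default_rng(0)
d, width, n_steps, lr = 20, 256, 2000, 0.05

# Toy binary labels from a low-degree polynomial target (assumed for illustration).
X = rng.normal(size=(1000, d))
y = np.sign(X[:, 0] * X[:, 1] + 0.5 * X[:, 2])

# Xavier-style random initialization of a depth-3 network.
W1 = rng.normal(size=(d, width)) / np.sqrt(d)
W2 = rng.normal(size=(width, width)) / np.sqrt(width)
w3 = rng.normal(size=width) / np.sqrt(width)

def relu(z):
    return np.maximum(z, 0.0)

for t in range(n_steps):
    i = rng.integers(len(X))                       # pick one example at random
    x, target = X[i], y[i]
    h1 = relu(x @ W1)
    h2 = relu(h1 @ W2)
    out = h2 @ w3
    if 1.0 - target * out > 0.0:                   # hinge-loss margin violated
        g_out = -target                            # d loss / d out
        g_h2 = g_out * w3                          # backprop through output layer
        g_h1 = (g_h2 * (h2 > 0)) @ W2.T            # backprop through second layer
        w3 -= lr * g_out * h2
        W2 -= lr * np.outer(h1, g_h2 * (h2 > 0))
        W1 -= lr * np.outer(x, g_h1 * (h1 > 0))
```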
Similar sources
Online learning of positive and negative prototypes with explanations based on kernel expansion
Classification remains a topic of discussion in many current articles. Most of the models presented in these articles lack an explanation that is comprehensible to humans. One way to create explainability is to separate the weights of the network into positive and negative parts based on the prototype. The positive part represents the weights of the correct class ...
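As a hedged illustration of what separating weights into positive and negative parts could look like, the snippet below splits a linear layer's weight vector into w = w_plus − w_minus so each part's contribution can be reported separately; the linear model and variable names are assumptions for illustration, not the authors' construction.

```python
# Illustrative sketch (assumed, not the authors' exact method): decompose a
# weight vector into non-negative "positive" and "negative" parts so that
# evidence for and against a class can be reported separately.
import numpy as np

w = np.array([0.8, -1.2, 0.0, 0.5, -0.3])    # hypothetical learned weights
w_plus = np.maximum(w, 0.0)                  # evidence for the class
w_minus = np.maximum(-w, 0.0)                # evidence against the class
assert np.allclose(w, w_plus - w_minus)

x = np.array([1.0, 0.5, 2.0, -1.0, 0.3])     # hypothetical input / prototype
score = x @ w
support, opposition = x @ w_plus, x @ w_minus
print(score, support - opposition)           # identical by construction
```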
General Hardy-Type Inequalities with Non-conjugate Exponents
We derive a whole series of new integral inequalities of Hardy type with non-conjugate exponents. First, we prove and discuss two equivalent general inequalities of this type, as well as their corresponding reverse inequalities. The general results are then applied to a special Hardy-type kernel and power weights. Also, some estimates of weight functions and constant factors are obtained. ...
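For background only (a standard fact, not taken from this source): conjugate exponents are pairs p, q > 1 with 1/p + 1/q = 1, and the classical Hardy integral inequality in the conjugate case reads
\[
\int_0^{\infty} \Bigl( \frac{1}{x} \int_0^{x} f(t)\,dt \Bigr)^{p} dx
\;\le\; \Bigl( \frac{p}{p-1} \Bigr)^{p} \int_0^{\infty} f(x)^{p}\,dx,
\qquad p > 1,\ f \ge 0 .
\]
The non-conjugate setting studied above relaxes the requirement that the two exponents satisfy 1/p + 1/q = 1.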
متن کامل"Oddball SGD": Novelty Driven Stochastic Gradient Descent for Training Deep Neural Networks
Stochastic Gradient Descent (SGD) is arguably the most popular of the machine learning methods applied to training deep neural networks (DNN) today. It has recently been demonstrated that SGD can be statistically biased so that certain elements of the training set are learned more rapidly than others. In this article, we place SGD into a feedback loop whereby the probability of selection is pro...
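A hedged sketch of the idea described above, under the assumption that "novelty" is measured by each example's current prediction error and that selection probabilities are made proportional to it; the linear model and update rule are illustrative, not the article's exact procedure.

```python
# Illustrative sketch (assumptions noted above): SGD where the probability of
# selecting a training example is proportional to its current error ("novelty").
import numpy as np

rng = np.random.default_rng(1)
n, d, lr = 500, 10, 0.1
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
for step in range(5000):
    errors = np.abs(X @ w - y) + 1e-8          # per-example "novelty"
    p = errors / errors.sum()                  # feedback loop: error -> selection prob.
    i = rng.choice(n, p=p)                     # novelty-driven example selection
    grad = 2.0 * (X[i] @ w - y[i]) * X[i]      # squared-loss gradient for example i
    w -= lr * grad
```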
A conjugate gradient based method for Decision Neural Network training
Decision Neural Network is a new approach for solving multi-objective decision-making problems based on artificial neural networks. Using imprecise evaluation data, network training is improved and the number of required training data sets is decreased. The available training method is based on gradient descent (backpropagation, BP). One of its limitations is its convergence speed. Therefore, ...
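As a hedged illustration of the kind of update a conjugate gradient method uses in place of plain gradient descent, the sketch below applies the Fletcher–Reeves nonlinear conjugate gradient rule to a generic differentiable loss; the loss, fixed step size, and absence of restarts are assumptions for illustration.

```python
# Illustrative sketch (assumed details): Fletcher-Reeves nonlinear conjugate
# gradient on a generic loss, contrasted with the plain gradient-descent (BP)
# update mentioned in the abstract.
import numpy as np

def loss_and_grad(w, X, y):
    r = X @ w - y
    return float(r @ r), 2.0 * X.T @ r         # squared loss and its gradient

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5)

w = np.zeros(5)
_, g = loss_and_grad(w, X, y)
d = -g                                         # initial search direction
for it in range(50):
    alpha = 1e-3                               # fixed step (a line search is typical)
    w = w + alpha * d
    _, g_new = loss_and_grad(w, X, y)
    beta = (g_new @ g_new) / (g @ g + 1e-12)   # Fletcher-Reeves coefficient
    d = -g_new + beta * d                      # conjugate direction update
    g = g_new
```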
SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations. Nonetheless, current generalization bounds for neural networks fail to explain this phenomenon. In an attempt to bridge this gap, we study the problem of learning a two-layer over-parameterized neural network, when the data is generate...
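A hedged sketch of that setting, with illustrative assumptions: linearly separable labels from a hidden vector, a two-layer network whose hidden width far exceeds the number of samples, fixed random output weights, and plain SGD on the hinge loss; the paper's exact architecture and analysis are not reproduced here.

```python
# Illustrative sketch (assumptions noted above): SGD on an over-parameterized
# two-layer ReLU network trained on linearly separable data.
import numpy as np

rng = np.random.default_rng(3)
n, d, width, lr = 50, 10, 1000, 0.01           # width >> n: over-parameterized
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)                        # linearly separable labels

W = rng.normal(size=(d, width)) / np.sqrt(d)   # trained hidden layer
v = rng.choice([-1.0, 1.0], size=width)        # fixed random output weights (assumed)

for step in range(20000):
    i = rng.integers(n)
    h = np.maximum(X[i] @ W, 0.0)
    out = h @ v / width
    if 1.0 - y[i] * out > 0.0:                 # hinge-loss margin violated
        g_h = -y[i] * v / width
        W -= lr * np.outer(X[i], g_h * (h > 0))
```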
Publication date: 2017